Golang Job: Senior Site Reliability Engineer

Company

Workday
Ireland

Location

Remote Position
(From Everywhere/No Office Location)

Job type

Full-Time

Golang Job Details

Do what you love. Love what you do.

At Workday, we help the world’s largest organizations adapt to what’s next by bringing finance, HR, and planning into a single enterprise cloud. We work hard, and we’re serious about what we do. But we like to have fun, too. We put people first, celebrate diversity, drive innovation, and do good in the communities where we live and work.
About the Team
Workday is building a new Site Reliability Engineering (SRE) team responsible for deploying, operating, and supporting a state-of-the-art cloud-native service platform in production. The platform is built using cloud-native (CNCF) technologies on a foundation of Kubernetes in public cloud environments. It provides a secure platform on which Workday service teams and platform development teams can build and test their pre-release code and deploy it to production on a continuous basis.

The primary function of the SRE team is to ensure the reliability of the platform, to reduce operational load, and to scale sustainably in line with business growth. All SRE responsibilities and team growth will be supported by Service Level Objectives (SLOs).

Engineers from this team have shared their experiences at Cloud Native conferences, including KubeCon.
About the Role
  • The role is a hybrid of software engineering and operations, with an emphasis on reducing operational toil. Automation work is planned following scrum practices with two-week sprints.
  • SREs bring a software-centric approach to improving operational efficiency.
  • Build software systems which automate multistage deployment of tested changes to staging and all production environments
  • The SRE team owns the operations of the platform, including production incident response and automated runbook enhancements to reduce on-call toil.
  • The SRE team owns the deployment of new environments by enhancing deployment automation and streamlining dependency management.
  • The SRE team defines effective SLIs and ensures that SLOs are achieved by building an extensible observability architecture, automating runbooks, and establishing processes.
  • Improve platform reliability, observability and overall customer satisfaction.
  • Partner with platform teams to design and pilot SRE standards that their respective services must meet. Define benchmarks and automation to qualify services to move to production environments.
  • The responsibilities and reach of this SRE team will scale sustainably, in line with business growth
About You
Basic Qualifications
  • 3+ years of SRE experience
  • Extensive engineering experience with Linux
  • Hands-on experience with and knowledge of Kubernetes primitives such as Pods, Deployments, RBAC, and StatefulSets.
  • Proficiency with at least one programming language, preferably Golang (Go)
  • Understanding of software development best practices such as code management, CI/CD
  • Academic qualification in Computer Science or a related field and 3+ years of relevant experience
Other Qualifications
  • Skills and enthusiasm to operate, maintain, support and sustain the platform
  • Obsessive automator, with a track record of referenceable examples
  • A passion for full-stack debugging and diagnosing problems across configuration, Linux operating systems, and the network.
  • Knowledge of extending Kubernetes is an advantage (e.g., CRDs, Operators, Controllers)
  • Can work independently and with the mindset that everything can be automated
  • Excited by working in a fast-paced environment
  • Experience collaborating with cross-functional, global, and remote teams with diverse backgrounds
  • Excellent documentation skills
  • Ideally, you have knowledge and experience of at least one public cloud platform such as AWS, GCP, or Azure.
Are you being referred to one of our roles? If so, ask your connection at Workday about our Employee Referral process!